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Abstract: We propose an algorithmic strategy for improving the efficiency of Monte 
Carlo searches for the low-energy states of proteins. Our strategy is motivated by 
a model of how proteins alter their shapes. In our model when proteins fold under 
physiological conditions, their backbone dihedral angles change synchronously in groups 
of four or more so as to avoid steric clashes and respect the kinematic conservation 
laws. They wriggle; they do not thrash. We describe a simple algorithm that can be 
used to incorporate wriggling in Monte Carlo simulations of protein folding. We have 
tested this wriggling algorithm against a code in which the dihedral angles are varied 
independently (thrashing). Our standard of success is the average root-mean-square 
distance (rmsd) between the a-carbons of the folding protein and those of its native 
structure. After 100,000 Monte Carlo sweeps, the relative decrease in the mean rmsd, 
as one switches from thrashing to wriggling, rises from 11% for the protein 3LZM with 
164 amino acids (aa) to 40% for the protein lAlS with 313 aa and 47% for the protein 
16PK with 415 aa. These results suggest that wriggling is useful and that its utility 
increases with the size of the protein. One may implement wriggling on a parallel 
computer or a computer farm. 
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1. Why Proteins Wriggle 

We propose an algorithmic strategy for improving the efficiency of Monte Carlo searches 
for the low-energy states of proteins. Our strategy is motivated by a model in which 
proteins alter their shapes by means of local motions that minimize the displacement 
of distant atoms. 

Folding proteins avoid steric clashes and respect the kinematic conservation laws. 
The system consisting of a protein and the nearby solvent molecules approximately 
conserves its energy, momentum, and angular momentum. The shape of a protein is 
mainly defined by the angles of rotation, (pi and ipi, about the backbone bonds that 
link the a-carbons to the adjacent amide planes. A change in a one of these dihedral 
angles would rotate a significant part of the protein molecule, moving each atom by a 
length proportional to its distance from the axis of rotation. In a protein consisting of 
hundreds or thousands of amino acids in water, such a rotation would engender steric 
clashes and grossly violate the kinematic conservation laws. Instead when a protein 
folds or unfolds in our model, its backbone dihedral angles conspire in groups of four 
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or more to change in ways that hmit their displacement of distant atoms. Proteins 
wriggle; they do not thrash. 

Localized motions of the protein backbone involve at least four bonds, but simpler 
local motions are possible in simpler systems. In lattice models all local motions, such as 
corner moves and crankshaft moves are localized. Polymers also possess simple local 
motions in the continuum. In polyethylene, for instance, two backbone bonds separated 
by a trans bond are parallel, and so equal and opposite rotations about these parallel 
bonds constitute a motion that is localized. Such crankshaft moves in polymers have 
been seen in simulations guided by brownian and molecular dynamics 0, |^, |, |], ^ || . 

Continuum Monte Carlo searches strike a balance between the temporal detail of 
molecular dynamics and the rigidity of the lattice. They are defined by their kinematics 
(how the proteins move) and their dynamics (why they move). By kinematics we mean 
the variables that define the state of the protein and the kinds of Monte Carlo moves 
that are permitted. The dynamics is determined by the energy function. Monte Carlo 
simulations do not incorporate wriggling in their kinematics; this paper is about how 
they could and whether they should. 

Our wriggling algorithm is based upon the linear dependence of every quartet of 
three-dimensional vectors. In the next section, we use this linear dependence to show 
that one may choose the four angles of rotation about any four backbone bonds so that 
the combined motion of the protein is localized. A simple computer algorithm that 
may be used to incorporate wriggling in Monte Carlo searches is outlined in section 3. 

We have run three simple tests to determine whether wriggling actually improves 
the efficiency of a Monte Carlo search; we describe these tests and their results in 
section 4. In each test we compared simulations guided by the wriggling algorithm to 
ones guided by a standard thrashing algorithm in which the dihedral angles are varied 
independently. In order to separate the kinematics of folding from the dynamics of 
folding, we used a nearly perfect but artificial energy function that is proportional to 
the root-mean-square distance (rmsd) of the folding a-carbons from the a-carbons of 
the native structure. Because there is no simple relation between this rmsd and an 
energy, and because the use of wriggling alters the effective temperature, we performed 
our Monte Carlo runs at absolute zero. On each protein we performed two sets of eight 
or more runs of 100,000 sweeps, one set controlled by the wriggling algorithm and the 
other by a thrashing algorithm. The runs started from fully denatured random coils. 
We averaged the final rmsd's. The relative decrease in the average final rmsd as one 
switches from the thrashing code to the wriggling code ((rmsd)th — (rmsd)„r) / (rmsd)wr 
is a measure of the utility of wriggling. This wriggling advantage rose from 11% for 
the protein 3LZM with 164 amino acids (aa), to 40% for the protein lAlS with 313 
aa, and to 47% for the protein 16PK with 415 aa. The advantage of wriggling seems 
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to grow with the length of the protein. 

In section 5 we sketch how one might implement wriggling with a realistic energy 
function on a parallel computer or on a farm of computers. We summarize the present 
work and mention some of its limitations in section 6. 



2. How Proteins Wriggle 

Three-dimensional space is spanned by any three linearly independent vectors. Every 
quartet of three-dimensional vectors is linearly dependent — that is, for every four 
three-dimensional vectors Vi, V2, v^, v^, there exist four numbers Xi,X2, x^, X4 such that 
the weighted sum of the vectors vanishes, 

4 

Y^^i^i-O- (2-1) 

i=l 

In this section we use these mathematical facts to show that one always may choose 
the angles of rotation about any four bonds so as to minimize the net effect of the four 
rotations upon distant atoms. 

The change dr in the position f of an atom due to a rotation by a small angle e 
about a bond axis taken to be a unit vector b is the cross-product of eb with the vector 
from any point c on the axis to the point r, 

dr = ebx{f-c}. (2.2) 

To first order in e, the net displacement dr of the position r of an atom due to four 
rotations by the small angles about the bonds 6j for i = 1, 2, 3, 4 is the sum 

4 4 

dr = ^dri = ^ (^eik x (f- q)^ . (2.3) 

i=l i=l 

If we use a for the average of the four points q, which typically would be the midpoint 

— * 

between two a-carbons, then we may express the net displacement dr as 



dr 



4 / \ 

(^eik X (f- d+d-Ci)^ = l^eibi J x(f-a)+^ (^eA x (a - q)^ . (2.4) 

1=1 \i=l / 1=1 



The displacement dr will be independent of the potentially large moment arm f — a 
(for every atom) if the sum of the bond vectors bi weighted by their angles vanishes. 
That is, the net displacement dr is merely 

4 

dr = Y (eik X (a - q)) , (2.5) 



i=l 
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which is independent of r* — a, if the angle- weighted sum of bond vectors vanishes, 

4 

5^ei&. = 0. (2.6) 

i=l 

And because every quartet of three-dimensional vectors is linearly dependent ( p.lD , it 
is always possible to choose the four small angles so that this sum vanishes for any 
four bond vectors bj. 



3. A Wriggling Algorithm 

In this section we show how to transform the wriggling condition ( |2.6[ ) into a matrix 
equation that can be solved by standard linear-algebra software, such as the freely 
available LAPACK § subroutine DGESV . 



The wriggling condition (2^) can be written in the more explicit form 



^ fem (-en/e4) = &i4 (3.1^ 



n=l 



for i = 1, 2, 3. Let us arrange the first three axes bi, 62, and 63 into the matrix A with 
elements Aj„ = 6j„ for i,n = 1, 2, 3 and rename the fourth axis 64 as the vector B with 
components = b^. If we now use X for the vector with components X„ = — e„/e4 
for n = 1, 2, 3, then the wriggling condition (p.l|) becomes 



J2AnXn = B, (3.2) 



n=l 

for i = 1, 2, 3. 

The LAPACK subroutine DGESV is designed to solve such linear equations. The 

call 

call DGESV ( 3, 1, A, 3, ipiv, B, 3, info ) (3.3) 

returns the three angle ratios X„ = — e„/e4 for n = 1,2,3 as the three components 
Bn of the vector B. The value info = indicates that the computation is successful, 
and ipiv contains pivot indices, which may be ignored. We set = —1 so that 
e„ = —£4 Bn = Bn for n = 1, 2, 3 and then normalize the four angles 



n=l 
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Lastly we multiply them by a random number x drawn uniformly from the interval 
(—0.0125, 0.0125) radians, so that our final angles are 6'„ = xe„ for n = 1, 2, 3, 4. 

Although we used the linear equation ( p.l|) as our wriggling condition, we used 
the exact and general form ( [A.8D of the rotation matrix described in the appendix to 
implement all rotations. 

4. Does Wriggling Work? 

In order to test the utility of our wriggling algorithm, we performed Monte Carlo simu- 
lations of protein folding on three proteins: T4 lysozyme (3LZM.pdb, 164 aa), ornithine 
carbamoyltransferase (lAlS.pdb, 313 aa), and phosphoglycerate kinase (16PK.pdb, 415 
aa). 

We used an artificially nearly perfect energy function that is proportional to the 
root-mean-square distance (rmsd) between the a-carbons of the folding protein and 
those of its native structure. This nearly perfect energy function allows us to separate 
the kinematics of folding (the Monte Carlo moves — wriggling or thrashing) from the 
dynamics of folding (the mechanisms in the energy function — conformational entropy, 
charge-charge interactions, hydrogen bonds, van der Waals interactions, hydrophobic- 



ity ITT], |T2[)- Because there is no simple relationship between the a-carbon rmsd and 
an energy, and because wriggling changes the effective temperature, we conducted our 
simulations at absolute zero rather than at physiological temperatures or at that of 
hquid nitrogen. 

Each of our tests consisted of 8 or 10 pairs of Monte Carlo runs, one guided by the 
wriggling algorithm and the other by a thrashing algorithm. Each run began with a 
random coil and ran for 100,000 sweeps, each sweep being a sequence of applications 
of the algorithm successively along the primary structure of the protein. The wriggling 
code applies the algorithm described in Eqs. (p.l|- p.4D to successive quartets of dihedral 
angles. After each wriggle the code performs a Metropolis step; since the temperature 
is zero, the wriggle is accepted if and only if it lowers the rmsd. In each sweep the 
wriggling code wriggles firstly the four dihedral angles 02, 4'2, 03, "03 of residues 2 and 3, 
secondly the angles -02, 03, "03, 04 of residues 2, 3, and 4, thirdly the angles 03, -03, 04, -04 
of residues 3 and 4, and continues in this way down the chain to the penultimate 
residue. The code does not vary the angle of any proline residue. To keep these angles 
fixed, the code performs one of several procedures when one or more proline residues is 
involved in a wriggle. In each sweep the thrashing code successively and independently 
changes, by a random angle 56 drawn uniformly from the interval (—0.0125, 0.0125) 
radians, every dihedral angle from the first ip to the last 0, except for the 0's of the 
prolines; it accepts each change if and only if the change lowers the rmsd. Apart from 
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Figure 1: For the protein lAlS, the lines trace the values of the rmsd for 8 runs guided by 
the wriggling algorithm and 8 guided by the thrashing algorithm. 

end effects, the thrashing code makes the same number of Monte Carlo judgments per 
sweep as does the wrigghng code. 

In our first test of wriggling, we constructed 8 fully denatured random coils of the 
protein 3LZM, which has 164 residues. The rmsd's of these denatured configurations 
ranged from 18.2 to 128.5 A. In 8 runs of 100,000 sweeps, the average rmsd was 
1.46 ± 0.07 A for the wriggling code and 1.62 ± 0.05 A for the thrashing code. The 
mean thrashing rmsd was 11% larger than the mean wriggling rmsd. 

We performed our second test of wriggling on the protein lAlS, which has 313 
residues. Our 8 denatured coils of lAlS had rmsd's running from 25.9 A to 251.5 
A. The rmsd's of the 16 wriggling and thrashing runs are plotted in Fig. |I[ After 
about 40,000 sweeps, the thrashing runs separate out into a cluster of lines, labeled 
as thrashing, that lie distinctly above the wriggling runs, labeled as wriggling. The 8 
wriggling runs had an average final rmsd of 1.67 ±0.04 A, while that of the 8 thrashing 
runs was 2.33 ± 0.03 A or 40% greater. 

For our third test of wriggling, we first randomized and stretched the native struc- 
ture of the protein 16PK, which has 415 (visible) residues, into 10 fully denatured 
coils with rmsd's running from 24.7 A to 341.8 A. We then allowed our wriggling 
and thrashing codes to reduce the rmsd's of these 10 random coils in runs of 100,000 
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Figure 2: For the protein 16PK, the hnes trace the values of the rmsd for 10 runs guided by 
the wriggling algorithm and 10 guided by the thrashing algorithm. The plot labeled by the 
letter "t" is a successful thrashing outlier. 

sweeps. The rmsd's of the 20 runs are plotted in Fig. |^ as a function of sweep number. 
Apart from one thrashing run, the rmsd's of the wriggling runs drop below those of the 
thrashing runs after about 30,000 sweeps. The outlying thrashing run, labeled by the 
letter "t," did slightly better than the two worst wriggling runs. After 100,000 sweeps, 
the average rmsd of the wriggling runs was 1.66 ± 0.03 A, while that of the thrashing 
runs was 2.44 ± 0.08 A. The mean rmsd of the thrashing code was 47% greater than 
that of the wriggling code. 

One estimate of the utility of wriggling is the relative decrease in the average final 
rmsd as one switches from the thrashing code to the wriggling code. 



In these three tests, the utility of wriggling increased with the size of the protein, rising 
from an advantage 11% at 164 aa to 40% at 313 aa and 47% at 415 aa. The longer 
the protein, the larger are the motions that occur when the backbone-bond angles are 
varied one at a time, and so the greater are the need for and the advantage of wriggling. 

Each of the proteins lAlS and 16PK has two domains. Is the advantage of wriggling 
over thrashing in these two cases due merely to a better twist in the polypeptide strand 



(rmsd)th — (rmsd) 



wr 



(4.1) 



(rmsd) 
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that connects the two domains? To answer this question, we measured the rmsd's of 
the individual domains of the final configurations of lAlS and 16PK given by the 
wriggling and thrashing codes. The average rmsd's of the first and second domains 
of lAlS respectively were 1.78 and 1.56 A with wriggling and 2.05 and 2.57 A with 
thrashing. Those of 16PK were 1.52 and 1.79 A with wriggling and 2.43 and 2.67 A 
with thrashing. These results suggest that wriggling gives better domains, not just 
better connecting strands. 

The wriggling code differs from the thrashing code in two respects: its basic moves 
are four rotations rather than a single rotation and large motions of remote atoms 
are suppressed. To evaluate the two effects separately, we wrote a code in which 
the elemental moves are groups of four rotations but in which no wriggling condition 
is imposed. We let this coordinated-thrashing code fold our ten denatured starting 
configurations of the protein 16PK and found after 100,000 sweeps that the average 
rmsd was 1.88 ± 0.04 A which is to be compared with 1.66 ± 0.03 A for the wriggling 
code and 2.44 ± 0.08 A for the thrashing code. So wriggling is better than coordinated 
thrashing and much better than thrashing, but part of the success of wriggling arises 
from the coordination of its compound elemental moves. 

Because of our use of the rmsd as an artificially nearly perfect energy function, 
the proteins of our simulations are phantoms; they can move through each other. A 
real but approximate energy function would reject all moves into excluded volume; it 
therefore would reject many thrashing moves because of their large-scale motions. The 
use of the rmsd in our three tests deprives wriggling of one of its key advantages over 
thrashing and over coordinated thrashing, namely that its localized motions are less 
likely to involve steric clashes. Thus the utility of wriggling in simulations with real 
energy functions may be greater than is indicated by these tests. 

The wriggling code runs somewhat more slowly than the thrashing code. But 
a realistic energy function would slow down both codes by so much that the speed 
advantage of thrashing would be negligible. 

In the first 10,000 sweeps of our tests of the wriggling algorithm, the thrashing 
code reduced its rmsd's more quickly than the wriggling code. It might therefore be 
worthwhile to experiment with codes that relax the wriggling condition for the first 
10,000 sweeps or that mix coordinated thrashing with wriggling. 

The action of a wriggle is much less than that of a thrash; the action of a small-angle 
wriggle may be as little as lOh. So quantum-mechanical effects are more important 
with wriggling than with thrashing, but even so they probably would be obscured by 



decoherence [13 . 
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5. Wriggling on a Parallel Computer 



Most of the residues of a folded protein lie in alpha helices and beta sheets. The 
secondary structure of a protein is the assignment of residues to helices, sheets, turns, 
and coils. One may list the possible secondary structures of a protein and assign one 
secondary structure to each processor of a parallel computer or computer farm. Each 
processor would perform Monte Carlo moves on the dihedral angles of the residues in 
the turns and coils of its secondary structure but would leave invariant the dihedral 
angles of its helices and sheets. Wriggling should be used in the coils and in turns 
longer than 4 or 5 aa, but probably not in turns of 3 or 4 aa, where it might overly 
constrain the folding of the protein. Because each processor would vary the dihedral 
angles (and possibly the principal side-chain angles) only of the residues in the coils 
and turns, the simulation would run quickly enough to be guided by a reahstic energy 
function 111, |I5|, |l6l O with solvation and excluded volume. At the end of a run of 



perhaps 100,000 sweeps, the final energies of the different secondary structures would be 
compared and their folds stored. Many runs would be required to test all the plausible 
secondary structures. This implementation of wriggling would make optimum use of 
a parallel computer or of a computer farm; no time would be lost to inter-processor 
communication or to waiting. 



6. Conclusions and Caveats 



We have described and tested an algorithmic strategy for improving the efficiency of 
Monte Carlo searches for the low-energy states of proteins. Our strategy is motivated 
by a model in which the laws of physics constrain the incremental motions of proteins 
to be essentially local. Localized motions can be incorporated in Monte Carlo searches 
by a simple algorithm that rotates the dihedral angles in groups of four. To test this 
wriggling algorithm, we performed 52 zero-temperature, 100,000-sweep, Monte Carlo 
searches for the low-energy states of the proteins 3LZM, lAlS, and 16PK using the 
rmsd as an artificially nearly perfect energy function. The searches guided by the 
wriggling algorithm reached lower rmsd's than those guided by the usual thrashing 
algorithm by a margin that increased with the length of the protein. But it remains 
to be seen whether and how this wriggling algorithm might improve the efficiency of 
Monte Carlo searches performed at finite temperature and guided by an approximate, 
realistic energy function with solvation and excluded volume. 
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A. Rotation Matrices 

For the sake of completeness, we derive in this appendix an exact formula for the general 
rotation matrix from the expression (|2.2|) for an infinitesimal rotation. 



To simplify the notation, we shall consider an axis that runs through the origin 
and choose c = 0. In this case by Eq.(|2.2|), a right-handed rotation about a bond b by 
an infinitesimal angle e changes a vector r by the small amount dr = eb x f where the 
cross-product b x r has the components 

3 3 

(6 X f)i = ^ ^ Sikj bk Tj (A.l) 

3=1 k=l 

in which the totally anti-symmetric tensor Sikj has elements £123 = £231 = £^312 = 1, 
£^213 = ^132 = £^321 = —1, with all other elements zero, e.g. ens = 0, etc. If we use the 
definition {Lk)ij = Sikj of the rotation generators L, then we may write the change dri 
as 

33 3 
dri = e^^ bk{Lk)ij rj = e^{b- tj^ rj (A.2) 

i=l k=l j=l 

or in matrix notation as 

dr = eb-Lr. (A.3) 

Let us use the f{6) for the vector r after a right-handed rotation by the angle 6 about 
the axis b. Then by Eq.( [A.3D the vector f{6) satisfies the differential equation 



= (A.4) 

The solution that satisfies the boundary condition r(0) = r is 

r{e) =exp{eb- L)r{0), (A.5) 
and so the matrix that represents a finite rotation by the angle 6 about the axis b is 

R{eb) =exp{eb- L). (A.6) 
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The exact form of Eg . (12.21) when the axis b does not go through the origin but 
through another point c is 



riie)-c, = Y,R^Ji^b) 
i=i 



J2 [eMOb ■ L) 



(r,-c,). (A.7) 



In our codes we used this formula with the matrix R given by 

Rij{db) = cos 9 6ij — sin 9 [^^^ ^ijk bk^ + (1 — cos 9) hi bj 
which is convenient for computation. 



(A.8) 
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